Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

Authors

  • Yonatan Belinkov
  • James R. Glass
Abstract

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair. Previous work relied on manually-crafted grammars or pivoting via English, both of which are unsatisfactory for building a scalable and accurate MT system. In this work, we compare standard phrase-based and neural systems on Arabic-Hebrew translation. We experiment with tokenization by external tools and subword modeling by character-level neural models, and show that both methods lead to improved translation performance, with a small advantage to the neural models.

Similar resources

Machine Translation between Hebrew and Arabic: Needs, Challenges and Preliminary Solutions

Modern Hebrew and Modern Standard Arabic, both Semitic languages, share many orthographic, lexical, morphological, syntactic and semantic similarities, but they are still not mutually comprehensible. Most native Hebrew speakers in Israel do not speak Arabic, and the vast majority of Arabs (outside Israel) do not speak Hebrew. Machine translation (MT) between these two languages has the potential...

A Hebrew verb-complement dictionary

We present a verb-complement dictionary of Modern Hebrew, automatically extracted from text corpora. Carefully examining a large set of examples, we defined ten types of verb complements that cover the vast majority of the occurrences of verb complements in the corpora. We explored several collocation measures as indicators of the strength of the association between the verb and its complement....

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Corpus-based Explorations of Affective Load Differences in Arabic-Hebrew-English

This work is about connotative aspects of words, which depend on specific cultures and are often not carried over in translation. A cross-language computational study is presented, based on applying similarity techniques to large corpora of news documents in English, Arabic, and Hebrew. In particular, the exploration focuses on specific terms expressing emotion, negotiation and conflict.

Segmentation for English-to-Arabic Statistical Machine Translation

In this paper, we report on a set of initial results for English-to-Arabic Statistical Machine Translation (SMT). We show that morphological decomposition of the Arabic source is beneficial, especially for smaller-size corpora, and investigate different recombination techniques. We also report on the use of Factored Translation Models for English-to-Arabic translation.

Journal:
  • CoRR

Volume abs/1609.07701

Publication date 2016